Dynamic application instance placement in data center environments

ABSTRACT

Techniques are disclosed for determining placements of application instances on computing resources in a computing system such that the application instances can be executed thereon. By way of example, a method for determining an application instance placement in a set of machines under one or more resource constraints includes the following steps. An estimate is computed of a value of the first metric that can be achieved by a current application instance placement and a current application load distribution. A new application instance placement and a new application load distribution are determined, wherein the new application instance placement and the new load distribution optimize the first metric.

FIELD OF THE INVENTION

The present invention generally relates to computing systems and, more particularly, to techniques for determining placements of application instances on computing resources in a computing system such that the application instances can be executed thereon.

BACKGROUND OF THE INVENTION

With the rapid growth of the Internet, many organizations increasingly rely on web (i.e., World Wide Web) applications to deliver critical services to their customers and partners. An “application” generally refers to software code (e.g., one or more programs) which performs one or more functions.

Over the course of a decade, web applications have evolved from the early HyperText Transfer Protocol (HTTP) servers that only deliver static HyperText Markup Language (HTML) files, to the current ones that run in sophisticated distributed environments, e.g., Java 2 Enterprise Edition (J2EE), and provide a diversity of services such as online shopping, online banking, and web search. Modern Internet data centers may run thousands of machines to host a large number of different web applications. Many web applications are resource demanding and process client requests at a high rate. Previous studies have shown that the web request rate is bursty in nature and can fluctuate dramatically in a short period of time. Therefore, it is not cost-effective to overprovision data centers in order to handle the potential peak demands of all the applications.

To utilize system resources more effectively, modern web applications typically run on top of a middleware system and rely on it to dynamically allocate resources to meet the applications' performance goals. “Middleware” generally refers to the software layer that lies between the operating system and the applications. Some middleware systems use a clustering technology to improve scalability, availability and load balancing, by integrating multiple instances of the same application, and presenting them to the users as a single virtual application.

SUMMARY OF THE INVENTION

Principles of the invention provide techniques for determining placements of application instances on computing resources in a computing system such that the application instances can be executed thereon.

By way of example, in one aspect of the invention, a method for determining an application instance placement in a set of machines under one or more resource constraints includes the following steps. An estimate is computed of a value of the first metric that can be achieved by a current application instance placement and a current application load distribution. A new application instance placement and a new application load distribution are determined, wherein the new application instance placement and the new application load distribution optimize the first metric.

The determining step may further include the new application instance placement improving upon the first metric and the new load distribution improving upon a second metric. The determining step may further include shifting an application load, changing the application instance placement without pinning to determine a first candidate placement, changing the application instance placement with pinning to determine a second candidate placement, and selecting a best placement from the first candidate placement and the second candidate placement as the new application instance placement. The determining step may be performed multiple times.

The method may also include the step of balancing an application load across the set of machines.

The first metric may include a total number of satisfied demands, a total number of placement changes, or an extent to which an application load is balanced across the set of machines.

One of the one or more resource constraints may include a processing capacity or a memory capacity.

The second metric may include a degree of correlation between residual resources on each machine of the set of machines, or a number of underutilized application instances.

These and other objects, features and advantages of the present invention will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates an example of clustered web applications, according to an embodiment of the invention.

FIG. 2 illustrates a control loop and system for solving an application placement problem, according to an embodiment of the invention.

FIG. 3 illustrates symbols used in a description of an application placement problem, according to an embodiment of the invention.

FIG. 4 illustrates a high-level pseudo code implementation of an application placement algorithm, according to an embodiment of the invention.

FIG. 5 illustrates a max-flow problem for use in solving an application placement problem, according to an embodiment of the invention.

FIG. 6 illustrates a pseudo code implementation of an application placement algorithm, according to an embodiment of the invention.

FIG. 7 illustrates a pseudo code implementation of a placement changing function, according to an embodiment of the invention.

FIG. 8 illustrates a pseudo code implementation of a load shifting function, according to an embodiment of the invention.

FIG. 9 illustrates a graphical user interface for use with an application placement algorithm, according to an embodiment of the invention.

FIG. 10 illustrates a computing system for implementing an application placement algorithm, according to an embodiment of the invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

Illustrative principles of the invention will be explained below in the context of an Internet-based/web application environment. However, it is to be understood that the present invention is not limited to such an environment. Rather, the invention is more generally applicable to any data processing environment in which it would be desirable to provide improved processing performance.

In the illustrative description below, the following problem is addressed. Given a set of machines (computing systems or servers) and a set of web applications with dynamically changing demands (e.g., the number of client requests for use of the application), an application placement controller decides how many instances to run for each application and where to put them (i.e., which machines to assign them to), while observing a variety of resource constraints. “Instances” of an application generally refer to identical copies of the application, but can also refer to different or even overlapping parts of the application. This problem is considered non-deterministic polynomial-time (NP) hard. Illustrative principles of the invention propose an online algorithm that uses heuristics to efficiently solve this problem. The algorithm allows multiple applications to share a single machine, and strives to maximize the total satisfied application demand, to minimize the number of application starts and stops, and to balance the load across machines. It is to be understood that reasonable extensions of the proposed algorithm can also optimize for other performance goals, for example, maximize or minimize certain user-specified utility functions.

FIG. 1 is an example of clustered web applications. System 100 includes one front-end request router 102, three back-end computing nodes 104 (A, B, and C), and three applications 106 (x, y, and z). The applications, for example, can be a catalog search application, an order processing application, and an account management application, for an online shopping site. Request router 102 receives external requests (from client devices, not shown) and forwards them to the appropriate instances of the three applications (106-x, 106-y, and 106-z). To achieve the quality of service (QoS) goals of the applications, the request router may implement functions such as admission control, flow control, and load balancing.

Flow control and load balancing decide how to dynamically allocate resources to the running application instances. Illustrative principles of the invention address an equally important problem. That is, given a set of machines with constrained resources and a set of web applications with dynamically changing demands, we determine how many instances to run for each application and what machine to execute them on.

We call this problem dynamic application placement. We assume that not every machine can run all the applications at the same time due to limited resources such as memory.

Application placement is orthogonal to flow control and load balancing, and the quality of a placement solution can have profound impacts on the performance of the entire system (i.e., the complete set of machines used for hosting applications). In FIG. 1, suppose the request rate for application z suddenly surges. Application z may not meet the demands even if all the resources of machine C are allocated to application z. A middleware system then may react by stopping application x on machines A and B, and using the freed resources (e.g., memory) to start an instance of application z on both A and B.

We illustratively formulate the application placement problem as a variant of the Class Constrained Multiple-Knapsack Problem (see, e.g., H. Shachnai and T. Tamir, “Noah's bagels—some combinatorial aspects,” In Proc. 1st Int. Conf. on Fun with Algorithms, 1998; and H. Shachnai and T. Tamir, “On two class-constrained versions of the multiple knapsack problem,” Algorithmica, 29(3), pp. 442-467, 2001). Under multiple resource constraints (e.g., CPU and memory) and application constraints (e.g., the need for special hardware or software), an automated placement algorithm strives to produce placement solutions that optimize multiple objectives: (1) maximizing the total satisfied application demand, (2) minimizing the total number of application starts and stops, and (3) balancing the load across machines. It is to be understood that we can also optimize for other objective functions, for example, a user-specified utility function.

The placement problem is NP hard. In one embodiment, the invention provides an online heuristic algorithm that can produce within 30 seconds high-quality solutions for hard placement problems with thousands of machines and thousands of applications. This scalability is crucial for dynamic resource provisioning in large-scale enterprise data centers. Compared with existing algorithms, for systems with 100 machines or less, the proposed algorithm is up to 134 times faster, reduces the number of application starts and stops by up to a factor of 32, and satisfies up to 25% more application demands.

The remainder of the detailed description is organized as follows. Section I formulates the application placement problem. Section II describes an illustrative placement algorithm.

I. Problem Formulation

FIG. 2 is a diagram of a control loop and system 200 for solving the application placement problem. For brevity, we simply refer to “application placement” as “placement” in the following illustrative description. Placement controller 202 is the main placement processing component of the control loop. The set of machines (data center) 203 includes the machines for which placement controller 202 determines application placement.

Inputs 204 to placement controller 202 include the current placement of applications on machines (matrix I), the resource capacity of each machine (CPU capacity vector Ω and memory capacity vector Γ), the projected resource demand of each application (CPU demand vector ω and memory demand vector γ), and the restrictions that specify whether a given application can run on a given machine (matrix R), e.g., some applications may require machines with special hardware or software. It is to be appreciated that such inputs are collected by auxiliary components. That is, placement sensor 205 generates and maintains current placement matrix I. Application demand estimator 206 generates and maintains the projected resource demand of each application (CPU demand vector ω and memory demand vector γ). Configuration database 207 maintains the resource capacity of each machine (CPU capacity vector Ω and memory capacity vector Γ).

Taking inputs 204, placement controller 202 generates outputs 208 including new placement matrix I and load distribution matrix L. That is, placement controller 202 computes a new placement solution (new matrix I) that optimizes certain objective functions, and then passes the solution to placement executor 209 to start and stop application instances accordingly. The placement executor schedules placement changes in such a way that they impose minimum disturbances to the running system. Every T minutes, the placement controller produces a new placement solution based on the current inputs. By way of example only, T=15 minutes may be a default configuration.

Estimating application demands is a non-trivial task. In one embodiment, we use online profiling and linear regression to dynamically estimate the average CPU cycles needed to process one web request for a given application. The product of the estimated CPU cycles per request and the projected request rate gives the CPU cycles needed by the application per second. However, it is to be understood that other known techniques for estimating application demand may be used.
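By way of illustration only, the demand estimate described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the estimator of any particular embodiment: it assumes a single application profiled in isolation, a simple least-squares line with an intercept for load-independent overhead, and hypothetical function names and sample values.

```python
def fit_cycles_per_request(request_rates, cpu_cycles_per_sec):
    """Least-squares fit of cpu = slope * rate + intercept.

    The slope is read as the average CPU cycles per request; the
    intercept absorbs overhead that does not depend on the request
    rate (an assumption of this sketch).
    """
    n = len(request_rates)
    mean_rate = sum(request_rates) / n
    mean_cpu = sum(cpu_cycles_per_sec) / n
    sxx = sum((r - mean_rate) ** 2 for r in request_rates)
    sxy = sum((r - mean_rate) * (c - mean_cpu)
              for r, c in zip(request_rates, cpu_cycles_per_sec))
    slope = sxy / sxx
    intercept = mean_cpu - slope * mean_rate
    return slope, intercept


def projected_cpu_demand(cycles_per_request, projected_request_rate):
    """CPU cycles per second = cycles per request * requests per second."""
    return cycles_per_request * projected_request_rate


# Hypothetical profiling samples: (requests/sec, CPU cycles/sec).
slope, intercept = fit_cycles_per_request([10, 20, 30], [25e6, 45e6, 65e6])
demand = projected_cpu_demand(slope, 50)  # projected rate of 50 requests/sec
```

The resulting per-application value plays the role of the CPU demand ω for that application in the inputs to the placement controller.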

The remainder of this section presents the formal formulation of the illustrative placement problem. We first discuss the system resources and application demands considered in the placement problem. An application's demands for resources can be characterized as either load-dependent or load-independent. A running application instance's consumption of load-dependent resources depends on the request rate. Examples of such resources include CPU cycles and network bandwidth. A running application instance also consumes some load-independent resources regardless of the offered load, i.e., even if it processes no requests. An example of such resources is the process control block (PCB) maintained in the operating system kernel for each running program.

In this embodiment, for practical reasons, we treat memory as a load-independent resource, and conservatively estimate the memory usage to ensure that every running application has sufficient memory. It is assumed that the system includes a component that dynamically estimates the upper limit of an application's near-term memory usage based on a time series of its past memory usage. Because the memory usage estimation is updated dynamically, some load-dependent aspects of memory are indirectly considered by the placement controller.

We treat memory as a load-independent resource for several reasons. First, a significant amount of memory is consumed by an application instance even if it receives no requests. Second, memory consumption is often related to prior application usage rather than its current load. For example, even in the presence of a low load, memory usage may still be high as a result of data caching. Third, because an accurate projection of future memory usage is extremely difficult and many applications cannot run when the system is out of memory, it is more reasonable to be conservative in the estimation of memory usage, i.e., using the upper limit instead of the average.

Among many load-dependent and load-independent resources, we choose CPU and memory as the representative ones to be considered by the placement controller, because we observe that they are the most common bottleneck resources. For example, our experience shows that many business J2EE applications require on average 1-2 GB (gigabytes) of real memory to run. For brevity, the description of the algorithm only considers CPU and memory, but it is to be understood that the algorithm can consider other types of resources as well. For example, if the system is network-bounded, we can use network bandwidth as the load-dependent resource, which introduces no changes to the algorithm.

Next, we present the formal formulation of the placement problem. FIG. 3 lists the symbols used in the description. The inputs to the placement controller are the current placement matrix I, the placement restriction matrix R, the CPU and memory capacity of each machine (Ω_(n) and Γ_(n)), and the CPU and memory demand of each application (ω_(m) and γ_(m)). Note that ω_(m) is application m's aggregated CPU demand throughout the entire system (i.e., the complete set of machines used for hosting applications), while γ_(m) is the memory requirement to run one instance of application m. Due to special hardware or software requirements, an application m may not be able to run on a machine n. This placement restriction is represented as R_(m,n)=0.

The outputs 208 of placement controller 202 are the updated placement matrix I and the load distribution matrix L. Placement executor 209 starts and stops application instances according to the difference between the old and new placement matrices. The load distribution matrix L is a byproduct. It helps verify the maximum total application demand that can be satisfied by the new placement matrix I. L may or may not be directly used by the placement executor or the request router. The request router may dynamically balance the load according to the real received demands rather than the load distribution matrix L computed based on the projected demands.

Placement controller 202 strives to find a placement solution that maximizes the total satisfied application demand. Again, it is to be understood that this is just one example of the optimization goal. That is, principles of the invention may also be used to optimize for other objective functions instead of maximizing the total satisfied demand, for example, maximizing a certain user-specified utility function. In addition, the placement controller also tries to minimize the total number of application starts and stops, because placement changes disturb the running system and waste CPU cycles. In practice, many J2EE applications take a few minutes to start or stop, and take some additional time to warm up their data cache. The last optimization goal is to balance the load across machines. Ideally, the utilization of individual machines should stay close to the utilization ρ of the entire system:

$\rho = \frac{\sum_{m \in M}\sum_{n \in N} L_{m,n}}{\sum_{n \in N}\Omega_{n}} \qquad (1)$

As we are dealing with multiple optimization objectives, we prioritize them in the formal problem statement below. Let I* denote the old placement matrix, and I denote the new placement matrix:

$\begin{matrix}
(i)\quad \text{maximize}\ \sum_{m \in M}\sum_{n \in N} L_{m,n} & (2) \\
(ii)\quad \text{minimize}\ \sum_{m \in M}\sum_{n \in N} \left| I_{m,n} - I_{m,n}^{*} \right| & (3) \\
(iii)\quad \text{minimize}\ \sum_{n \in N} \left| \frac{\sum_{m \in M} L_{m,n}}{\Omega_{n}} - \rho \right| & (4) \\
\text{such that} & \\
\forall m \in M,\ \forall n \in N \quad I_{m,n} = 0\ \text{or}\ I_{m,n} = 1 & (5) \\
\forall m \in M,\ \forall n \in N \quad R_{m,n} = 0 \Rightarrow I_{m,n} = 0 & (6) \\
\forall m \in M,\ \forall n \in N \quad I_{m,n} = 0 \Rightarrow L_{m,n} = 0 & (7) \\
\forall m \in M,\ \forall n \in N \quad L_{m,n} \geq 0 & (8) \\
\forall n \in N \quad \sum_{m \in M} \gamma_{m} I_{m,n} \leq \Gamma_{n} & (9) \\
\forall n \in N \quad \sum_{m \in M} L_{m,n} \leq \Omega_{n} & (10) \\
\forall m \in M \quad \sum_{n \in N} L_{m,n} \leq \omega_{m} & (11)
\end{matrix}$
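By way of example only, the following sketch evaluates objectives (2) through (4) and checks constraints (5) through (11) for a candidate solution. It assumes matrices are given as nested lists indexed [m][n], with applications as rows and machines as columns. The function names and the small tolerance eps are assumptions of the sketch and do not appear in the figures.

```python
def system_utilization(L, Omega):
    """Equation (1): total allocated CPU load over total CPU capacity."""
    return sum(sum(row) for row in L) / sum(Omega)


def objectives(I_new, I_old, L, Omega):
    """Return (satisfied demand, placement changes, load imbalance)."""
    M, N = len(L), len(Omega)
    rho = system_utilization(L, Omega)
    satisfied = sum(L[m][n] for m in range(M) for n in range(N))        # (2)
    changes = sum(abs(I_new[m][n] - I_old[m][n])
                  for m in range(M) for n in range(N))                  # (3)
    imbalance = sum(abs(sum(L[m][n] for m in range(M)) / Omega[n] - rho)
                    for n in range(N))                                  # (4)
    return satisfied, changes, imbalance


def is_feasible(I, L, R, Omega, Gamma, omega, gamma, eps=1e-9):
    """Check constraints (5)-(11) for placement I and load distribution L."""
    M, N = len(omega), len(Omega)
    for m in range(M):
        for n in range(N):
            if I[m][n] not in (0, 1):                  # (5)
                return False
            if R[m][n] == 0 and I[m][n] != 0:          # (6)
                return False
            if I[m][n] == 0 and L[m][n] != 0:          # (7)
                return False
            if L[m][n] < 0:                            # (8)
                return False
    for n in range(N):
        if sum(gamma[m] * I[m][n] for m in range(M)) > Gamma[n] + eps:  # (9)
            return False
        if sum(L[m][n] for m in range(M)) > Omega[n] + eps:             # (10)
            return False
    for m in range(M):
        if sum(L[m]) > omega[m] + eps:                 # (11)
            return False
    return True
```

Because the objectives are prioritized, two candidate solutions may be compared lexicographically: higher satisfied demand first, then fewer placement changes, then lower imbalance.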

As mentioned above, this optimization problem is a variant of the Class Constrained Multiple-Knapsack problem. It differs from the prior formulation mainly in that it also minimizes the number of placement changes. This problem is NP hard. In the next section, we present an online heuristic algorithm for solving the optimization problem.

II. Placement Algorithm

This section describes an illustrative embodiment of a placement algorithm, which can efficiently find high-quality placement solutions even under tight resource constraints. FIG. 4 shows a high-level pseudo code implementation of a placement algorithm according to an embodiment of the invention. A more complete version is illustrated in FIGS. 6, 7 and 8.

The core of the place() function is a loop that incrementally optimizes the placement solution. Inside the loop, the algorithm first solves the max-flow problem (see, e.g., R. K. Ahuja, T. L. Magnanti, and J. B. Orlin, editors, “Network Flows: Theory, Algorithms, and Applications,” Prentice Hall, N.J., 1993, ISBN 1000499012) in FIG. 5 to compute the maximum total demand that can be satisfied by the current placement matrix. The algorithm then invokes the load_shifting() subroutine to move load among machines (without any placement changes) in preparation for subsequent placement changes. Finally, the algorithm invokes the placement_changing() subroutine to start or stop application instances in order to increase the total satisfied application demand. Note that “placement change” and “load shifting” in the algorithm description are all hypothetical. The real placement changes are executed after the placement algorithm finishes. The outputs of the placement algorithm are the updated placement matrix I and the new load distribution matrix L. The load_shifting() subroutine modifies only L whereas the placement_changing() subroutine modifies both I and L.

Below, we first define some terms that will be used in the algorithm description (subsection A), and then generally describe key concepts of the algorithm (subsections B and C). Finally, we describe in detail the load-shifting subroutine (subsection D), the placement-changing subroutine (subsection E), and the full placement algorithm (subsection F) that invokes the two subroutines.

A. Definition of Terms

A machine is fully utilized if its residual CPU capacity is zero (Ω*_(n)=0); otherwise, it is underutilized. An application instance is fully utilized if it runs on a fully utilized machine. An instance of application m running on an underutilized machine n is completely idle if it has no load (L_(m,n)=0); otherwise, it is underutilized. The load of an underutilized instance of application m can be increased if application m has a positive residual CPU demand (ω*_(m)>0). Note that the definition of a machine's utilization is solely based on its CPU usage.

The CPU-memory ratio of a machine n is defined as its CPU capacity divided by its memory capacity, i.e., Ω_(n)/Γ_(n). Intuitively, it is harder to fully utilize the CPU of machines with a high CPU-memory ratio. The load-memory ratio of an instance of application m running on machine n is defined as the CPU load of this instance divided by its memory consumption, i.e., L_(m,n)/γ_(m). Intuitively, application instances with a higher load-memory ratio are more useful.
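The definitions above may be expressed, by way of example only, as the following small helpers. The function names are hypothetical and are introduced only for this sketch.

```python
def classify_instance(load, machine_residual_cpu):
    """Classify a running instance using the definitions of this subsection.

    load                 -- L[m][n], CPU load of the instance
    machine_residual_cpu -- residual CPU capacity of the hosting machine
    """
    if machine_residual_cpu == 0:
        return "fully utilized"   # instance runs on a fully utilized machine
    if load == 0:
        return "idle"             # no load on an underutilized machine
    return "underutilized"


def cpu_memory_ratio(cpu_capacity, memory_capacity):
    """Machine metric: CPU capacity divided by memory capacity."""
    return cpu_capacity / memory_capacity


def load_memory_ratio(load, memory_per_instance):
    """Instance metric: CPU load divided by the instance's memory use."""
    return load / memory_per_instance
```

Note that an instance with zero load on a fully utilized machine is classified as fully utilized, because the classification follows the machine's CPU state first.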

B. Load Shifting

Solving the max-flow problem in FIG. 5 gives the maximum total demand ŵ that can be satisfied by the current placement matrix I. Among many possible load distribution matrices L that can meet this maximum demand ŵ, we employ several load-shifting heuristics to find the one that makes later placement changes easier.

We classify the running instances of an application into three categories: idle, underutilized, and fully utilized. The idle instances are preferred candidates to be shut down. We opt for leaving the fully utilized instances intact.

Through proper load shifting, we can ensure that every application has at most one underutilized instance in the entire system. Reducing the number of underutilized instances simplifies the placement problem, because the heuristics to handle idle instances and fully utilized instances are straightforward. The issue of load balancing will be addressed separately in a later stage of the algorithm.

We strive to co-locate the residual memory and the residual CPU on the same machines so that the residual resources can be used to start new application instances. For example, if one machine has only residual CPU and another machine has only residual memory, neither of them can accept new applications.

We strive to make idle application instances appear on the machines with more residual memory. By shutting down the idle instances, more memory will become available for hosting applications with a high memory requirement.

C. Placement Changing

The load_shifting() subroutine prepares the load distribution in a way that makes later placement changes easier. The placement_changing() subroutine further employs several heuristics to increase the total satisfied application demand, to reduce placement changes, and to reduce computation time.

The algorithm walks through the underutilized machines sequentially and makes placement changes to them one by one in an isolated fashion. When working on a machine n, the algorithm is only concerned with the state of machine n and the residual application demands. The states of other machines do not directly affect the current decision to be made for machine n. Moreover, once the applications to run on machine n are decided, later placement changes on other machines will not affect the decision already made for machine n. This isolation of machines dramatically reduces the complexity of the algorithm.

The isolation of machines, however, may lead to inferior placement solutions. We address this problem by alternately executing the load-shifting subroutine and the placement-changing subroutine for multiple rounds. As a result, the residual application demands released from the application instances stopped in the previous round now have the opportunity to be allocated to other machines in the later rounds.

When sequentially walking through the underutilized machines, the algorithm considers machines with a relatively high CPU-memory ratio first. Because it is harder to fully utilize these machines' CPU, we prefer to process them first when we still have abundant options.

When considering the applications to run on a machine, the algorithm tries to find a combination of applications that leads to the highest CPU utilization of this machine. It prefers to stop the running application instances with a relatively low load-memory ratio in order to accommodate new application instances.

To reduce placement changes, the algorithm does not allow stopping application instances that already deliver a sufficiently high load. We refer to these instances as pinned instances. The intuition is that, even if we stop these instances on their hosting machines, it is likely that we will start instances of the same applications on other machines. The algorithm dynamically computes the pinning threshold for each application.

D. Load-Shifting Subroutine

Given the current application demands, the placement algorithm solves a max-flow problem to derive the maximum total demand that can be satisfied by the current placement matrix I. FIG. 5 is an example of this max-flow problem, in which we consider four applications (w, x, y, and z) and three machines (A, B, and C). Each application is represented as a node in the graph. Each machine is also represented as a node. In addition, there are a source node and a sink node. The source node has an outgoing link to each application m, and the capacity of the link is the CPU demand of the application (ω_(m)). Each machine n has an outgoing link to the sink node, and the capacity of the link is the CPU capacity of the machine (Ω_(n)). The last set of links are between the applications and the machines that currently run those applications. The capacity of these links is unlimited. In FIG. 5, application x currently runs on machines A and B. Therefore, x has two outgoing links: x→A and x→B.

When the load distribution problem is formulated as this max-flow problem, the maximum volume of flows going from the source node to the sink node is the maximum total demand ŵ that can be satisfied by the current placement matrix I. Efficient algorithms to solve max-flow problems are well known (see, e.g., R. K. Ahuja, T. L. Magnanti, and J. B. Orlin, editors, “Network Flows: Theory, Algorithms, and Applications,” Prentice Hall, N.J., 1993, ISBN 1000499012). If ŵ equals the total application demand, no placement changes are needed. Otherwise, some placement changes are made in order to satisfy more demands. Before doing so, the load distribution matrix L produced by solving the max-flow problem in FIG. 5 is first adjusted. A goal of this load shifting process is to achieve the effects described above, for example, co-locating the residual CPU and the residual memory on the machines.
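By way of illustration only, the max-flow formulation described above can be sketched with a textbook breadth-first augmenting-path method (Edmonds-Karp). This is one well-known max-flow algorithm chosen for brevity; the description does not state which solver is used. The function name, the dictionary-based inputs, and the sample demands, capacities, and placements (other than application x running on machines A and B) are assumptions of the sketch.

```python
from collections import defaultdict, deque


def max_satisfied_demand(omega, Omega, placement):
    """Return (max total satisfied demand, load per (app, machine)).

    omega     -- {app: CPU demand}
    Omega     -- {machine: CPU capacity}
    placement -- {app: [machines currently running the app]}
    """
    INF = float("inf")
    cap = defaultdict(lambda: defaultdict(float))  # residual capacities
    for app, demand in omega.items():
        cap["source"][("app", app)] = demand              # source -> app
        for machine in placement.get(app, []):
            cap[("app", app)][("machine", machine)] = INF  # app -> machine
    for machine, capacity in Omega.items():
        cap[("machine", machine)]["sink"] = capacity      # machine -> sink

    total = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {"source": None}
        queue = deque(["source"])
        while queue and "sink" not in parent:
            u = queue.popleft()
            for v, c in list(cap[u].items()):
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if "sink" not in parent:
            break
        # Find the bottleneck capacity along the path.
        bottleneck, v = INF, "sink"
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        # Push flow and update residual capacities.
        v = "sink"
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        total += bottleneck

    # Flow on an app -> machine link equals the reverse residual capacity.
    load = {(app, m): cap[("machine", m)][("app", app)]
            for app in placement for m in placement[app]}
    return total, load


# Hypothetical numbers for four applications and three machines.
omega = {"w": 4, "x": 6, "y": 3, "z": 5}
Omega = {"A": 5, "B": 5, "C": 5}
placement = {"w": ["A"], "x": ["A", "B"], "y": ["B"], "z": ["C"]}
total, load = max_satisfied_demand(omega, Omega, placement)
```

With these sample values the total demand is 18 while the total CPU capacity is 15, so the satisfied demand falls short of the total demand and placement changes would be considered.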

The task of load shifting is accomplished by solving the min-cost max-flow problem in FIG. 5. We sort all the machines in increasing order of residual memory capacity Γ*_(n), and associate each machine n with a rank r_(n) that reflects its position in this sorted list. The machine with rank 0 has the smallest residual memory. In FIG. 5, the link between a machine n and the sink node is associated with the cost r_(n). The cost of all the other links is zero, which is not shown in the figure for brevity. In this example, machine C has more residual memory than machine A, and machine A has more residual memory than machine B. Therefore, the links between the machines and the sink node have costs r_(B)=0, r_(A)=1, and r_(C)=2 respectively.
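The ranking step described above can be sketched as follows, by way of example only. The function name and the residual memory values are hypothetical. The treatment of ties (the order of the input mapping is kept) is an assumption, as the description does not specify it.

```python
def sink_link_costs(residual_memory):
    """Map each machine to its rank r_n in increasing residual memory.

    residual_memory -- {machine: residual memory capacity}
    The machine with the smallest residual memory gets rank 0.
    """
    ordered = sorted(residual_memory, key=lambda n: residual_memory[n])
    return {machine: rank for rank, machine in enumerate(ordered)}


# Hypothetical residual memory: C has more than A, and A has more than B.
costs = sink_link_costs({"A": 2, "B": 1, "C": 4})
# costs == {"B": 0, "A": 1, "C": 2}
```

Because flow through a machine with little residual memory is cheaper, a min-cost solution tends to load those machines first and to leave idle instances on machines with more residual memory.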

The load distribution matrix L produced by solving the min-cost max-flow problem in FIG. 5 has the following properties: (1) an application has at most one underutilized instance in the entire system; (2) the residual memory and the residual CPU are likely to co-locate on the same machines; and (3) the idle application instances appear on the machines with relatively more residual memory. More precisely, if application m has one underutilized instance running on machine n, then (a) application m's idle instances must run on machines whose residual memory is larger than or equal to that of machine n; and (b) application m's fully utilized instances must run on machines whose residual memory is smaller than or equal to that of machine n. It is to be appreciated that these properties make later placement changes easier.

E. Placement-Changing Subroutine

The placement-changing subroutine takes as input the current placement matrix I, the load distribution matrix L generated by the load-shifting subroutine, and the residual application demands not satisfied by L. It tries to increase the total satisfied application demand by making some placement changes, for instance, stopping idle application instances and starting useful ones. Again, note that the “placement changes” in the algorithm description are all hypothetical.

As shown in FIG. 4, the main structure of the placement-changing subroutine includes three nested loops. The outermost loop iterates over the machines and asks the intermediate loop to generate a placement solution for one machine n at a time. Suppose machine n currently runs c not-pinned application instances (M₁, M₂, . . . , M_(c)) sorted in increasing order of load-memory ratio. The intermediate loop iterates over a variable j (0≤j≤c). In iteration j, it stops on machine n the j applications (M₁, M₂, . . . , M_(j)) while keeping the other running applications intact, and then asks the innermost loop to find appropriate applications to consume machine n's residual resources. The innermost loop walks through the residual applications, and identifies those that can fit on machine n. As the intermediate loop varies the number of stopped applications from 0 to c, it collects c+1 different placement solutions for machine n, among which it picks the best one as the final solution.
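By way of example only, the enumeration performed by the intermediate loop can be sketched as follows. The sketch only generates the c+1 candidate sets of instances to stop. It does not model the innermost loop or the selection of the best candidate, and the function name and tuple layout are hypothetical.

```python
def candidate_stop_sets(unpinned_instances):
    """Return the c+1 candidate lists of applications to stop on a machine.

    unpinned_instances -- list of (app, load, memory) tuples for the
                          not-pinned instances running on the machine
    Candidate j stops the j instances with the lowest load-memory ratio.
    """
    ordered = sorted(unpinned_instances, key=lambda inst: inst[1] / inst[2])
    return [[app for app, _, _ in ordered[:j]]
            for j in range(len(ordered) + 1)]


# Hypothetical machine with three not-pinned instances.
candidates = candidate_stop_sets([("x", 6, 2), ("y", 1, 1), ("z", 0, 4)])
# Load-memory ratios: x = 3.0, y = 1.0, z = 0.0, so z is stopped first.
# candidates == [[], ["z"], ["z", "y"], ["z", "y", "x"]]
```

For c=3 this yields four candidates, whereas considering every subset of the three instances would yield eight.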

In the rest of this subsection, we describe the three nested loops in more detail.

The Outermost Loop. Before entering the outermost loop, the algorithm first computes the residual CPU demand of each application. We refer to the applications with a positive residual CPU demand (i.e., w*_(m)>0) as residual applications. The algorithm inserts all the residual applications into a right-threaded AVL (Adelson-Velsky and Landis) tree called residual_app_tree. The applications in the tree are sorted in decreasing order of residual demand. As the algorithm progresses, the residual demand of applications may change, and the tree is updated accordingly. The algorithm also keeps track of the minimum memory requirement γ_(min) of applications in the tree,

$\begin{matrix}{\gamma_{\min} = \min\limits_{m \in \text{residual\_app\_tree}} \gamma_{m},} & (12)\end{matrix}$

where γ_(m) is the memory needed to run one instance of application m. The algorithm uses γ_(min) to speed up the computation in the innermost loop. If a machine n's residual memory is smaller than γ_(min) (i.e., Γ*_(n)<γ_(min)), the algorithm can immediately infer that this machine cannot accept any applications in the residual_app_tree.

The algorithm excludes fully utilized machines from the consideration of placement changes, and sorts the underutilized machines in decreasing order of CPU-memory ratio. Starting from the machine with the highest CPU-memory ratio, it enumerates each underutilized machine, and asks the intermediate loop to compute a placement solution for the machine. Because it is harder to fully utilize the CPU of machines with a high CPU-memory ratio, we prefer to process them first when we still have abundant options.
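By way of illustration only, the machine ordering described above can be sketched as follows. The sketch assumes that the CPU-memory ratio of a machine is its residual CPU divided by its residual memory, and that a machine is underutilized when its residual CPU is positive; the text does not spell out either definition. The names Machine, cpu_memory_ratio, and machines_to_visit are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    residual_cpu: float     # unused CPU capacity (hypothetical field)
    residual_memory: float  # unused memory (hypothetical field)


def cpu_memory_ratio(machine):
    # Assumption: ratio of residual CPU to residual memory.
    if machine.residual_memory == 0:
        return float("inf")
    return machine.residual_cpu / machine.residual_memory


def machines_to_visit(machines):
    # Assumption: "underutilized" means some CPU is still unused.
    underutilized = [m for m in machines if m.residual_cpu > 0]
    # Highest CPU-memory ratio first, as described in the text.
    return sorted(underutilized, key=cpu_memory_ratio, reverse=True)


machines = [
    Machine("a", 0.0, 4.0),  # fully utilized CPU: excluded
    Machine("b", 2.0, 4.0),  # ratio 0.5
    Machine("c", 3.0, 1.0),  # ratio 3.0
]
order = [m.name for m in machines_to_visit(machines)]
```

In this toy input, machine c is visited before machine b, and machine a is skipped.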

The Intermediate Loop. Taking as input the residual_app_tree and a machine n given by the outermost loop, the intermediate loop computes a placement solution for machine n. Suppose machine n currently runs c not-pinned application instances. Application instance pinning is described below. We can stop a subset of the c applications, and use the residual resources to run other applications. In total, there are 2^(c) cases to consider. We use a heuristic to reduce this number to c+1. Intuitively, we prefer to stop the less “useful” application instances, i.e., those with a low load-memory ratio (L_(m,n)/γ_(m)).

The algorithm first sorts the not-pinned application instances on machine n in increasing order of load-memory ratio. Let (M₁, M₂, . . . , M_(c)) denote this sorted list. The intermediate loop iterates over a variable j (0≦j≦c). In iteration j, it stops on machine n the j applications (M₁, M₂, . . . , M_(j)) while keeping the other running applications intact, and then asks the innermost loop to find appropriate applications to consume machine n's residual resources that become available after stopping the j applications. As the intermediate loop varies the number of stopped applications from 0 to c, it collects c+1 placement solutions, among which it picks as the final solution the one that leads to the highest CPU utilization of machine n.

We illustrate this through an example. Suppose machine n currently runs three not-pinned application instances (M₁, M₂, M₃) sorted in increasing order of load-memory ratio. Intuitively, M₃ is more useful than M₂, and M₂ is more useful than M₁. The algorithm tries four placement solutions. In solution 1, it stops none of M₁, M₂, and M₃. In solution 2, it stops M₁ but keeps M₂ and M₃. In solution 3, it stops M₁ and M₂, but keeps M₃. In solution 4, it stops M₁, M₂, and M₃. For each solution, the innermost loop finds appropriate applications to consume machine n's residual resources that become available after stopping the applications. Among the four solutions, the algorithm picks the best one as the final solution.
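By way of illustration only, the c+1 candidate solutions of the intermediate loop can be enumerated as in the following sketch. It is not the pseudo code of FIG. 7. The tuple layout (application, load, memory), the function names, and the evaluate callback that stands in for the innermost loop are all hypothetical.

```python
def stop_candidates(instances):
    """Yield (j, stopped, kept) for j = 0..c.

    instances: not-pinned instances on one machine, each a tuple
    (app, load, memory) -- a hypothetical layout for this sketch.
    """
    # Increasing load-memory ratio: the least "useful" instances come first.
    ordered = sorted(instances, key=lambda inst: inst[1] / inst[2])
    for j in range(len(ordered) + 1):
        yield j, ordered[:j], ordered[j:]


def best_solution(instances, evaluate):
    # evaluate(stopped, kept) returns the machine's resulting CPU usage;
    # it stands in for the innermost loop (hypothetical callback).
    # max() keeps the first maximum, so ties favor stopping fewer instances;
    # that tie-break is a choice of this sketch, not stated in the text.
    return max(stop_candidates(instances),
               key=lambda cand: evaluate(cand[1], cand[2]))


def toy_evaluate(stopped, kept):
    # Toy stand-in: one residual application needing 10 units of memory
    # can add 30 units of load once enough memory has been freed.
    freed_memory = sum(mem for _, _, mem in stopped)
    kept_load = sum(load for _, load, _ in kept)
    return kept_load + (30.0 if freed_memory >= 10.0 else 0.0)


instances = [("M3", 50.0, 10.0), ("M1", 5.0, 10.0), ("M2", 20.0, 20.0)]
plans = [([a for a, _, _ in stopped], [a for a, _, _ in kept])
         for _, stopped, kept in stop_candidates(instances)]
j, stopped, kept = best_solution(instances, toy_evaluate)
```

With this toy input, the four plans stop none of the instances, M1 only, M1 and M2, and all three instances, respectively. The toy evaluation selects the plan that stops M1 only.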

The Innermost Loop. The intermediate loop changes the number of applications to stop. The innermost loop uses machine n's residual resources to run some residual applications. Recall that the residual_app_tree is sorted in decreasing order of residual CPU demand. The innermost loop iterates over the residual applications, starting from the one with the largest residual demand. When an application m is under consideration, it checks two conditions: (1) if the restriction matrix R allows application m to run on machine n, and (2) if machine n has sufficient residual memory to host application m (i.e., γ_(m)≦Γ*_(n)). If both conditions are satisfied, it places application m on machine n, and assigns as much load as possible to this instance until either machine n's CPU is fully utilized or application m has no residual demand. After this allocation, application m's residual demand changes, and the residual_app_tree is updated accordingly.

The algorithm loops over the residual applications until either: (1) all the residual applications have been considered once; or (2) machine n's CPU becomes fully utilized; or (3) machine n's residual memory is insufficient to host any residual application (i.e., Γ*_(n)<γ_(min), see Equation 12). Typically, after hosting a few residual applications, machine n's residual memory quickly becomes too small to host more residual applications. Therefore, the third condition helps reduce computation time.
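By way of illustration only, the innermost loop and its three termination conditions can be sketched as follows. For brevity, the sketch uses a sorted snapshot of a dictionary in place of the threaded AVL tree and recomputes γ_(min) on each step rather than maintaining it incrementally; both are simplifications made here. The function fill_machine and its parameters are hypothetical names.

```python
def fill_machine(free_cpu, free_memory, residual, memory_need, allowed):
    """Greedily place residual applications on one machine.

    residual:    dict app -> residual CPU demand (updated in place)
    memory_need: dict app -> memory for one instance (gamma_m)
    allowed:     set of apps the restriction matrix permits on this machine
    Returns a dict app -> load assigned to the new instance.
    """
    placed = {}
    # Largest residual demand first; each application is considered once.
    order = sorted((a for a, w in residual.items() if w > 0),
                   key=lambda a: residual[a], reverse=True)
    for app in order:
        still_residual = [memory_need[a] for a, w in residual.items() if w > 0]
        # Stop when the CPU is full or no residual application can fit
        # (residual memory below gamma_min, as in Equation 12).
        if free_cpu <= 0 or not still_residual or free_memory < min(still_residual):
            break
        if app not in allowed or memory_need[app] > free_memory:
            continue
        # Assign as much load as the CPU and the residual demand allow.
        load = min(free_cpu, residual[app])
        placed[app] = load
        residual[app] -= load
        free_cpu -= load
        free_memory -= memory_need[app]
    return placed


residual = {"A": 60.0, "B": 80.0, "C": 50.0, "D": 10.0}
memory_need = {"A": 20.0, "B": 40.0, "C": 10.0, "D": 10.0}
placed = fill_machine(100.0, 30.0, residual, memory_need, {"A", "B", "C"})
```

In this toy input, application B is skipped because it needs more memory than is free, A receives its full residual demand, C receives the remaining CPU, and the loop ends before D is examined because the CPU is fully utilized.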

F. Full Placement Algorithm

While the placement algorithm is outlined in FIG. 4, a full placement algorithm is illustrated in detail in FIGS. 6 through 8. Namely, FIG. 6 illustrates pseudo code for the place function, FIG. 7 illustrates pseudo code for the placement changing function, and FIG. 8 illustrates pseudo code for the load shifting function.

The placement algorithm incrementally optimizes the placement solution in multiple rounds. In one round, it first invokes the load-shifting subroutine and then invokes the placement-changing subroutine. It repeats for up to K rounds, but quits earlier if it sees no improvement in the total satisfied application demand after one round of execution. The last step of the algorithm balances the load across machines. By way of example only, we use the load-balancing component from an existing algorithm (A. Karve, T. Kimbrel, G. Pacifici, M. Spreitzer, M. Steinder, M. Sviridenko, and A. Tantawi, “Dynamic Application Placement for Clustered Web Applications,” In the International World Wide Web Conference (WWW), May 2006). However, other existing load balancing techniques can be employed. Intuitively, when the algorithm has choices, it moves the new application instances (started by the placement-changing subroutine) among machines to balance the load, while keeping the total satisfied demand and the number of placement changes the same.
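By way of illustration only, the multi-round structure described above can be sketched as follows. The sketch is not the place function of FIG. 6. The two subroutines and the demand measure are passed in as hypothetical callbacks, the final load-balancing step is omitted, and discarding the result of a round that brings no improvement is an assumption made here.

```python
def place(state, shift_load, change_placement, satisfied_demand, max_rounds):
    """Run up to max_rounds (K) rounds of load shifting and placement changing."""
    best_total = satisfied_demand(state)
    for _ in range(max_rounds):
        candidate = change_placement(shift_load(state))
        total = satisfied_demand(candidate)
        if total <= best_total:
            # No improvement after this round: quit early.
            # Assumption: the non-improving candidate is discarded.
            break
        state, best_total = candidate, total
    return state


# Toy stand-ins: the "state" is simply the satisfied demand so far, and
# each round of placement changes can satisfy at most 10 more units.
TOTAL_DEMAND = 25
result = place(
    0,
    shift_load=lambda s: s,
    change_placement=lambda s: min(TOTAL_DEMAND, s + 10),
    satisfied_demand=lambda s: s,
    max_rounds=10,
)
```

In this toy run, the satisfied demand grows from 0 to 10, 20, and 25 over three rounds, and the fourth round brings no improvement, so the loop ends before the limit of ten rounds is reached.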

The placement algorithm deals with multiple optimization objectives. In addition to maximizing the total satisfied demand, it also strives to minimize placement changes, because they disturb the running system and waste CPU cycles. In practice, many J2EE applications take a few minutes to start or stop, and take some additional time to warm up their data cache. The heuristic for reducing unnecessary placement changes is not to stop application instances whose load (in the load distribution matrix L) is above a certain threshold. We refer to them as pinned instances. The intuition is that, even if we stop these instances on their hosting machines, it is likely that we will start instances of the same applications on other machines.

Each application m has its own pinning threshold w_(m) ^(pin). If the value of the threshold is too low, the algorithm may introduce many unnecessary placement changes. If it is too high, the total satisfied demand may be low due to insufficient placement changes. The algorithm computes the pinning thresholds for all the applications from the information gathered in a single dry-run invocation of the placement-changing subroutine. The dry run pins no application instances. After the dry run, the algorithm makes a second invocation of the placement-changing subroutine, and requires pinning the application instances whose load is higher than or equal to the pinning threshold of the corresponding application, i.e., L_(m,n)≧w_(m) ^(pin). The dry run and the second invocation use exactly the same inputs: the matrices I and L produced by the load-shifting subroutine. Between the two placement solutions produced by the dry run and the second invocation, the algorithm picks as the final solution the one that has a higher total satisfied demand. If the total satisfied demands are equal (e.g., both solutions satisfy all the demands), it picks the one that has fewer placement changes.
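By way of illustration only, the selection between the dry-run solution and the pinned solution can be sketched as a lexicographic comparison. The Solution record and its field names are hypothetical. When both criteria tie, the sketch keeps the dry-run solution, a case the text does not address.

```python
from typing import NamedTuple


class Solution(NamedTuple):
    label: str
    satisfied_demand: float
    placement_changes: int


def pick_final(dry_run, pinned_run):
    # Higher satisfied demand wins; on a tie, fewer placement changes wins.
    return max(dry_run, pinned_run,
               key=lambda s: (s.satisfied_demand, -s.placement_changes))


dry = Solution("dry run", 100.0, 7)
pinned = Solution("pinned", 100.0, 3)
chosen = pick_final(dry, pinned)
```

In this toy input, both solutions satisfy the same demand, so the pinned solution is chosen because it makes fewer placement changes.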

Next, we describe how to compute the pinning threshold w_(m) ^(pin) for each application m from the information gathered in the dry run. Intuitively, if the dry run starts a new application instance, then we should not stop any instance of the same application whose load is higher than or equal to that of the new instance. This is because the new instance's load is considered sufficiently high by the dry run so that it is even worthwhile to start a new instance. Let w_(m) ^(new) denote the minimum load assigned to a new instance of application m in the dry run.

$\begin{matrix}{w_{m}^{new} = \min\limits_{I_{m,n} \in \{\text{new instances of app } m \text{ started in the dry run}\}} \{ L_{m,n} \text{ after the dry run} \}} & (13)\end{matrix}$

Here I_(m,n) represents a new instance of application m started on machine n in the dry run. L_(m,n) is the load of this instance. In addition, the pinning threshold also depends on the largest residual demand w*_(max) not satisfied in the dry run.

$\begin{matrix}{w_{\max}^{*} = \max\limits_{m \in \{\text{residual\_app\_tree after the dry run}\}} w_{m}^{*}} & (14)\end{matrix}$

Here w*_(m) is the residual demand of application m after the dry run. We should not stop the application instances whose load is higher than or equal to w*_(max). If we stop these instances, they will immediately become the applications that we try to find a place to run. The pinning threshold for application m is computed as follows.

$\begin{matrix}{w_{m}^{pin} = \max\left( 1, \min\left( w_{\max}^{*}, w_{m}^{new} \right) \right)} & (15)\end{matrix}$

Because we do not want to pin completely idle application instances, Equation 15 stipulates that the pinning threshold w_(m) ^(pin) should be at least one CPU cycle per second.
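By way of illustration only, Equations 13 through 15 can be evaluated as in the following sketch. The text does not say what happens when the dry run starts no new instance of an application or leaves no residual demand. The sketch assumes a minimum over an empty set is infinite and a maximum over an empty set of residual demands is zero; both conventions are assumptions. The function and parameter names are hypothetical.

```python
import math


def pinning_thresholds(apps, new_instance_loads, residual_after_dry_run):
    """Compute w_m^pin = max(1, min(w*_max, w_m^new)) for every application.

    new_instance_loads:     dict app -> loads of instances started in the dry run
    residual_after_dry_run: dict app -> residual demand left by the dry run
    """
    # Equation 14. Assumption: no residual application means w*_max = 0.
    w_max = max(residual_after_dry_run.values(), default=0.0)
    thresholds = {}
    for app in apps:
        # Equation 13. Assumption: no new instance means w_m^new = infinity.
        w_new = min(new_instance_loads.get(app, []), default=math.inf)
        # Equation 15: never below one CPU cycle per second.
        thresholds[app] = max(1.0, min(w_max, w_new))
    return thresholds


def is_pinned(load, threshold):
    # An instance is pinned when L_{m,n} >= w_m^pin.
    return load >= threshold


thresholds = pinning_thresholds(
    apps=["A", "B", "C"],
    new_instance_loads={"A": [30.0, 12.0]},
    residual_after_dry_run={"C": 20.0},
)
```

In this toy input, application A's threshold is the load of its smallest new instance (12), while B and C fall back to the largest residual demand (20).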

It is to be appreciated that most of the computation time of the placement algorithm is spent on solving the max-flow problem and the min-cost max-flow problem in FIG. 5. One example of an efficient algorithm for solving the max-flow problem is the highest-label preflow-push algorithm (R. K. Ahuja, T. L. Magnanti, and J. B. Orlin, editors, “Network Flows: Theory, Algorithms, and Applications,” Prentice Hall, N.J., 1993, ISBN 1000499012), whose complexity is O(s²√t) where s is the number of nodes in the graph, and t is the number of edges in the graph. One example of an efficient algorithm for solving the min-cost flow problem is the enhanced capacity scaling algorithm (also see R. K. Ahuja, T. L. Magnanti, and J. B. Orlin, editors, “Network Flows: Theory, Algorithms, and Applications,” Prentice Hall, N.J., 1993, ISBN 1000499012), whose complexity is O((s log t)(s+t log t)). Let N denote the number of machines, and M denote the number of applications. Due to the high memory requirement of J2EE applications, we assume that the number of applications that a machine can run is bounded by a constant. Therefore, in the network flow graph, both the number s of nodes and the number t of edges are bounded by O(N). The total number of application instances in the entire system is also bounded by O(N). Under these assumptions, the complexity of the placement algorithm is O(N^(2.5)).
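By way of illustration only, the O(N^(2.5)) bound can be obtained by substituting s = O(N) and t = O(N) into the two complexity expressions quoted above. The derivation additionally assumes that the number of rounds K is a constant, which is not stated above.

$\begin{matrix}{O\left( s^{2}\sqrt{t} \right) = O\left( N^{2} \cdot N^{0.5} \right) = O\left( N^{2.5} \right)} & \text{(max-flow)}\end{matrix}$

$\begin{matrix}{O\left( (s\log t)(s + t\log t) \right) = O\left( (N\log N)(N\log N) \right) = O\left( N^{2}\log^{2}N \right)} & \text{(min-cost flow)}\end{matrix}$

Because log²N grows more slowly than N^(0.5), the min-cost flow term is dominated by the max-flow term, and one round costs O(N^(2.5)).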

FIG. 9 illustrates a graphical user interface that may be used to visualize the real-time behavior of the placement algorithm executed by placement controller 202 (FIG. 2).

FIG. 10 illustrates a computing system in accordance with which one or more components/steps of the application placement system (e.g., components and methodologies described in the context of FIGS. 2 through 9) may be implemented, according to an embodiment of the present invention. It is to be understood that the individual components/steps may be implemented on one such computer system, or more preferably, on more than one such computer system. In the case of an implementation on a distributed computing system, the individual computer systems and/or devices may be connected via a suitable network, e.g., the Internet or World Wide Web. However, the system may be realized via private or local networks. The invention is not limited to any particular network.

Thus, the computing system shown in FIG. 10 may represent an illustrative architecture for a computing system associated with placement controller 202 (FIG. 2). For example, the computing system in FIG. 10 may be the computing system that performs the algorithm functions illustrated in the context of FIGS. 4-8 (as well as any applicable steps discussed in the context of such figures). Also, the computing system in FIG. 10 may represent the computing architecture for each of the machines (servers) upon which application instances are placed. Still further, placement sensor 205, application demand estimator 206, configuration database 207, and placement executor 209 may be implemented on one or more such computing systems.

As shown, computing system 1000 may be implemented in accordance with a processor 1002, a memory 1004, I/O devices 1006, and a network interface 1008, coupled via a computer bus 1010 or alternate connection arrangement.

It is to be appreciated that the term “processor” as used herein is intended to include any processing device, such as, for example, one that includes a CPU (central processing unit) and/or other processing circuitry. It is also to be understood that the term “processor” may refer to more than one processing device and that various elements associated with a processing device may be shared by other processing devices.

The term “memory” as used herein is intended to include memory associated with a processor or CPU, such as, for example, RAM, ROM, a fixed memory device (e.g., hard drive), a removable memory device (e.g., diskette), flash memory, etc.

In addition, the phrase “input/output devices” or “I/O devices” as used herein is intended to include, for example, one or more input devices (e.g., keyboard, mouse, etc.) for entering data to the processing unit, and/or one or more output devices (e.g., speaker, display, etc.) for presenting results associated with the processing unit. The graphical user interface of FIG. 9 may be implemented in accordance with such an output device.

Still further, the phrase “network interface” as used herein is intended to include, for example, one or more transceivers to permit the computing system of FIG. 10 to communicate with another computing system via an appropriate communications protocol.

Accordingly, software components including instructions or code for performing the methodologies described herein may be stored in one or more of the associated memory devices (e.g., ROM, fixed or removable memory) and, when ready to be utilized, loaded in part or in whole (e.g., into RAM) and executed by a CPU.

Accordingly, illustrative principles of the invention provide many advantages over existing approaches, for example:

The placement algorithm is an online algorithm that, under multiple resource constraints, can efficiently produce high-quality solutions for hard placement problems with thousands of machines and thousands of applications. By “online,” it is meant that the algorithm has to solve the placement problem in a short period of time (e.g., seconds or minutes) because the other computers are waiting for the decision in real time. By contrast, “offline” means that we can run the algorithm for hours, days, or even months to solve the problem. That is, nobody is waiting for the result right away. This scalability is crucial for dynamic resource provisioning in large-scale enterprise data centers.

A load-shifting mechanism that makes later placement changes easier. For example, it co-locates different types of residual resources on the same machines so that they can be used to start new application instances.

A mechanism to reduce the number of application starts and stops by pinning application instances that already deliver a sufficiently high load. The algorithm dynamically computes an appropriate pinning threshold for every application through a dry run of making placement changes.

A mechanism that does placement changes to the machines one by one in an isolated fashion. This strategy dramatically reduces the computation time, and also helps reduce the number of placement changes. We further address the limitations of this isolation of machines through multi-round optimizations.

Although illustrative embodiments of the present invention have been described herein with reference to the accompanying drawings, it is to be understood that the invention is not limited to those precise embodiments, and that various other changes and modifications may be made by one skilled in the art without departing from the scope or spirit of the invention.

1. A method for determining an application instance placement in a set of machines under one or more resource constraints, the method comprising the steps of: computing an estimate of a value of a first metric that can be achieved by a current application instance placement and a current application load distribution; and determining a new application instance placement and a new application load distribution that optimizes the first metric.
2. The method of claim 1, wherein the determining step further comprises the new application instance placement improving upon the first metric and the new load distribution improving upon a second metric.
3. The method of claim 1, wherein the determining step further comprises shifting an application load.
4. The method of claim 3, wherein the determining step further comprises changing the application instance placement without pinning to determine a first candidate placement.
5. The method of claim 4, wherein the determining step further comprises changing the application instance placement with pinning to determine a second candidate placement.
6. The method of claim 5, wherein the determining step further comprises selecting a best placement from the first candidate placement and the second candidate placement as the new application instance placement.
7. The method of claim 1, wherein the determining step is performed multiple times.
8. The method of claim 1, further comprising the step of balancing an application load across the set of machines.
9. The method of claim 1, wherein the first metric comprises a total number of satisfied demands.
10. The method of claim 1, wherein the first metric comprises a total number of placement changes.
11. The method of claim 1, wherein the first metric comprises an extent to which an application load is balanced across the set of machines.
12. The method of claim 1, wherein one of the one or more resource constraints comprises a processing capacity.
13. The method of claim 1, wherein one of the one or more resource constraints comprises a memory capacity.
14. The method of claim 1, wherein the second metric comprises a degree of correlation between residual resources on each machine of the set of machines.
15. The method of claim 1, wherein the second metric comprises a number of underutilized application instances.
16. Apparatus for determining an application instance placement in a set of machines under one or more resource constraints, the apparatus comprising: a memory; and at least one processor coupled to the memory and operative to: (i) compute an estimate of a value of a first metric that can be achieved by a current application instance placement and a current application load distribution; and (ii) determine a new application instance placement and a new application load distribution that optimizes the first metric.
17. An article of manufacture for determining an application instance placement in a set of machines under one or more resource constraints, comprising a machine readable medium containing one or more programs which when executed implement the steps of: computing an estimate of a value of a first metric that can be achieved by a current application instance placement and a current application load distribution; and determining a new application instance placement and a new application load distribution that optimizes the first metric.